Bias elimination and recent probing studies attempt to remove specific information from embedding spaces. This requires removing as much of the target information as possible while preserving all other information in the embedding. INLP is a popular recent method that removes specific information through iterative nullspace projections. Multiple iterations, however, increase the risk that information other than the target is negatively affected. We introduce two methods that find a single targeted projection: Mean Projection (MP, more efficient) and Tukey Median Projection (TMP, with theoretical guarantees). Our comparison of MP and INLP shows that (1) one MP projection removes the linear separability of the target information and (2) MP affects the overall space less. Further analysis shows that applying random projections after MP leads to the same overall effects on the embedding space as the multiple projections of INLP. Applying a single targeted (MP) projection is therefore methodologically cleaner than applying multiple (INLP) projections that introduce random effects.
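The idea of a single mean-based projection can be sketched in a few lines of NumPy: compute the difference between the two class means and project every embedding onto the nullspace of that direction. This is a minimal illustrative sketch, not the authors' released implementation; all names are assumptions.

```python
import numpy as np

def mean_projection(X, y):
    """Project embeddings X onto the nullspace of the class-mean
    difference direction, removing linear separability along it.
    Minimal sketch of a single mean-difference ("MP"-style) projection;
    function and variable names are illustrative."""
    d = X[y == 1].mean(axis=0) - X[y == 0].mean(axis=0)
    d = d / np.linalg.norm(d)
    # Remove the component of every embedding along d.
    return X - np.outer(X @ d, d)

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 8))
y = np.array([0] * 50 + [1] * 50)
X[y == 1] += 2.0          # make the two classes linearly separable
Xp = mean_projection(X, y)
# After one projection, no embedding has any component along the
# class-mean direction, so the class means coincide.
```

Because the mean of the projected embeddings equals the projection of the mean, a single projection already makes the two class means identical, which is the sense in which one targeted projection suffices.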
Abbreviations present a significant challenge for NLP systems because they cause tokenization and out-of-vocabulary errors. They can also make text less readable, especially in printed reference books, where they are used extensively. Abbreviations are especially problematic in low-resource settings, where systems are less robust to begin with. In this paper, we propose a new method for addressing the problems caused by a high density of domain-specific abbreviations in a text. We apply this method to the case of a Slovenian biographical lexicon and evaluate it on a newly developed gold-standard dataset of 51 Slovenian biographies. Our abbreviation identification method performs significantly better than commonly used ad-hoc solutions, especially at identifying unseen abbreviations. We also propose a method for expanding the identified abbreviations in context and present its results.
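To see why ad-hoc solutions struggle with unseen abbreviations, consider the kind of lexicon-based baseline such systems typically use. This is a hypothetical sketch for illustration only; the lexicon entries and function names are assumptions, not the authors' method or data.

```python
import re

# Tiny illustrative lexicon of known abbreviations (hypothetical).
KNOWN_ABBREVIATIONS = {"prof.", "dr.", "l."}

def find_abbreviations(text, lexicon=KNOWN_ABBREVIATIONS):
    """Return (offset, token) pairs for dot-terminated tokens found in
    a fixed abbreviation lexicon. Anything outside the lexicon is
    missed, which is exactly where such ad-hoc baselines fail."""
    hits = []
    for m in re.finditer(r"\S+\.", text):
        if m.group().lower() in lexicon:
            hits.append((m.start(), m.group()))
    return hits

hits = find_abbreviations("Rodil se je l. 1871, umrl pa ok. 1930.")
# "l." is in the lexicon and is found; "ok." is unseen and missed.
```

A learned identification method, by contrast, can generalize beyond a closed list, which is what the evaluation on unseen abbreviations measures.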
Argument unit recognition and classification aims to identify argument units in text and classify them as pro or against. One of the design choices to be made when developing a system for this task is what the unit of classification should be: segments of tokens or full sentences. Previous research suggests that fine-tuning language models at the token level yields more robust results for classifying sentences than training on sentences directly. We reproduce the study that originally made this claim and investigate further what token-based systems learn better than sentence-based systems. We develop systematic tests to analyze the behavioral differences between token-based and sentence-based systems. Our results show that token-based models are generally more robust than sentence-based models, both on manually perturbed examples and on specific subpopulations of the data.
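The two design choices compared above can be illustrated schematically: a token-based system predicts a stance label per token and aggregates them into one unit label, while a sentence-based system predicts a single label directly. The labels and toy predictions below are illustrative assumptions, not the authors' models.

```python
from collections import Counter

def unit_label_from_tokens(token_labels):
    """Aggregate per-token stance labels (e.g. 'pro'/'con'/'non') into
    a single argument-unit label by majority vote. A hypothetical
    aggregation rule for illustration."""
    return Counter(token_labels).most_common(1)[0][0]

# Token-based system: fine-tuned at the token level, then aggregated.
token_preds = ["pro", "pro", "non", "pro", "pro"]
token_based_label = unit_label_from_tokens(token_preds)

# Sentence-based system: one direct prediction for the whole sentence.
sentence_based_label = "pro"
```

The behavioral tests in the paper probe whether these two routes to the same label remain equally reliable under perturbation.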
\textbf{Offensive Content Warning}: This paper contains offensive language only to provide examples that clarify this research, and it does not reflect the authors' opinions. Please be aware that these examples are offensive and may cause you distress. The subjectivity of identifying \textit{hate speech} makes it a complex task. This is also reflected in the diverse and incomplete definitions found in NLP. We present \textit{hate speech} criteria, developed from legal and social-science perspectives, with the aim of helping researchers create more precise definitions and annotation guidelines along five aspects: (1) target groups, (2) dominance, (3) perpetrator characteristics, (4) type of negative group reference, and (5) type of potential consequences/effects. Definitions can be structured to cover a broader or a narrower phenomenon, so a conscious choice can be made about specifying criteria or leaving them open. We argue that the goal and the exact task developers have in mind should determine how the scope of \textit{hate speech} is defined. We provide an overview of the properties of English datasets from \url{hatespeechdata.com} that may help select the most suitable dataset for a specific scenario.
Despite their success, modern language models are fragile. Even small changes in the training pipeline can lead to unexpected results. We study this phenomenon by examining the robustness of ALBERT (arXiv:1909.11942) in combination with Stochastic Weight Averaging (SWA) (arXiv:1803.05407), a cheap way of ensembling, on a sentiment analysis task (SST-2). In particular, we analyze the stability of SWA with CheckList criteria (arXiv:2005.04118), examining the agreement on errors made by models that differ only in their random seed. We hypothesize that SWA is more stable because it ensembles model snapshots taken along the gradient descent trajectory. We quantify stability by comparing the models' mistakes with Fleiss' Kappa (Fleiss, 1971) and overlap ratio scores. We find that SWA reduces error rates in general; however, the models still suffer from their own distinct biases (according to CheckList).
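The core of SWA, ensembling in weight space rather than in prediction space, can be sketched in a few lines: collect snapshots of the weights along the optimization trajectory and average them. This is a minimal sketch under the simplifying assumption that weights are plain vectors; the tiny snapshots below are illustrative stand-ins for ALBERT checkpoints.

```python
import numpy as np

def swa_average(snapshots):
    """Average a list of weight vectors, one per training snapshot,
    into a single set of weights (the essence of Stochastic Weight
    Averaging applied to flat weight vectors)."""
    return np.mean(np.stack(snapshots), axis=0)

# Three hypothetical snapshots taken late in training, scattered
# around the same region of the loss surface.
snapshots = [np.array([0.9, -1.1]),
             np.array([1.1, -0.9]),
             np.array([1.0, -1.0])]
w_swa = swa_average(snapshots)
```

The averaged weights sit near the center of the snapshots, which is why SWA behaves like a cheap ensemble: one forward pass at prediction time, but smoothed over several points of the trajectory.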